-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ingest/apply-geolocation-rules: update general rules logic #55
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still wrapping my head around this, but first sending in a note that this doesn't work in some cases where it used to.
7de4dda
to
33a8979
Compare
Previously, we would backfill `rule_traversal` with wildcards ("*") if we could not find a matching rule for a particular `raw_geolocation`. This would NOT work for cases where there are partial rule matches AND wildcard rules that match. I realized the flaw in this logic while responding to @victorlin's post-merge review: #41 (comment) This commit updates the logic for when there are no matching rules. The `rule_traversal` is reset to the last index that is currently not a wildcard rule, and then change this value to a wildcard. This allows the recursive function to try different iterations of `rule_traversal` with different combinations of raw values and wildcards.
33a8979
to
0f6e1c2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for the explanation and tests!
Co-authored-by: Victor Lin <13424970+victorlin@users.noreply.github.com>
Anything still blocking this from being merged? @joverlee521 @victorlin? |
There is one difference after running this, Ciudad Autonoma de Buenos Aires turns into Buenos Aires, I think that's on purpose, so I'll merge.
|
Thanks for checking in on this @corneliusroemer! Yes, this is an expected change of this PR since the updated logic now allows for these partial match rules to work:
(This is a rule pulled from ncov-ingest) |
Previously, we would backfill
rule_traversal
with "*" general rulesif we could not find a matching rule for a particular
raw_geolocation
.This would NOT work for cases where the "*" general rules preceded
actual geolocations such as "*/*/*/D". I realized the flaw in this logic
while responding to @victorlin's post-merge review:
#41 (comment)
This commit updates the logic to reset
rule_traversal
to the lastindex that is currently not a "*" general rule, then change this value
to a "*". This allows the recursive function to try different iterations
of
rule_traversal
with different combinations of raw values and "*"s.